(57) Abstract: Disclosed herein is a method of analyzing three-dimensional data
obtained from a chromatography/spectrometry process, in particular an LC/MS
process, using a two-dimensional multivariate statistical analysis. The LC
portion permits separation of analytes within a sample. The information
obtained from such a procedure is typically reported as retention time (Rt). As
the analytes progress through the system, they enter the MS region of the LC/MS
system. There they are ionized, and a mass detector then detects these ionized
species. The information procured from this procedure is generally reported as
intensity for a corresponding m/z value. Therefore, an LC/MS system provides at
least three pieces of information. Principal component analysis (PCA) is a
robust method of multivariate analysis of this type of data between different
samples. However, PCA is typically performed using only two-dimensional data,
so at least one parameter of the data obtained from an LC/MS operation is
expected to be lost. The methods disclosed herein preserve all three
mathematical dimensions.
A METHOD OF USING DATA BINNING IN THE ANALYSIS OF
CHROMATOGRAPHY/SPECTROMETRY DATA 

CROSS REFERENCE TO RELATED APPLICATIONS



This application claims the benefit of U.S. Provisional Application No.
60/384,712, filed May 31, 2002 (Attorney Docket No. WAA-306), the content of
which is expressly incorporated herein by reference in its entirety.



STATEMENT ON FEDERALLY SPONSORED RESEARCH 

N/A 



FIELD OF THE INVENTION

The present invention relates to a method of analyzing multi-dimensional
data with an analysis method applicable to fewer dimensions by applying data
binning to the data. In particular, the present invention relates to a method for
the analysis of chromatography/spectrometry data using data binning. The
invention has specific application to methods of liquid chromatography/mass
spectrometry (LC/MS) data analysis employing data binning combined with
principal component analysis.

BACKGROUND OF THE INVENTION 

Metabonomics is a rapidly growing area of scientific research. It is a
systems approach for studying in vivo metabolic profiles and can provide
information on a particular disease, toxicity, and gene function. In metabonomics,
the effect of a pharmaceutical candidate on a whole animal or organism is
investigated by studying the changes in metabolism over a time course following
compound administration. The analytical data generated in these studies are
analyzed by multivariate mathematical techniques such as Principal Component
Analysis (hereinafter "PCA"). This mathematical technique is employed to
highlight both subtle and gross differences in the samples being examined.

To date, the vast majority of work in this field has utilized proton nuclear
magnetic resonance (NMR) as the analytical method of choice. While very
effective, NMR has several disadvantages, namely, poor sensitivity,
time-consuming analysis, and the non-detection of some chemical classes, e.g., sulfates.
A further drawback of NMR is that, because all of the signals are contained in one
spectrum, it is easy for the signal from one abundant compound to mask subtle but
important changes in a low concentration analyte. Further, the necessary removal of
xenobiotic-related compounds from the NMR spectrum also removes signals from
endogenous compounds of interest, thus reducing the data set used for subsequent
PCA analysis.



Chromatography, both gas and liquid, combined with spectrometric
analysis such as ultraviolet spectroscopy, infrared spectroscopy, nuclear magnetic
resonance, or mass spectrometry, has evolved into a powerful technique.
Electrospray mass spectrometry coupled with liquid chromatography (LC/MS) has
become the technique of choice for bioanalysis, both quantitative and qualitative.
The technique is robust, sensitive, and selective, with sensitivities down to the pg/mL
range readily achieved. The use of short columns and rapid gradients has also
allowed LC/MS to achieve analytical cycle times in the region of one sample per
minute with good chromatographic resolution and sensitivity. Thus the application
of LC/MS to metabonomics is a logical step. While the mass spectrometer gives
both high sensitivity quantitation and structural information, the
chromatography step addresses the issue of overlapping signals by separating the
analytes in time. The use of LC/MS is preferred over mass
spectrometry infusion because the chromatography step reduces ion suppression by
reducing the number of competing ions entering the mass spectrometer at any
given time. Another advantage of LC/MS over NMR is observed in data analysis.
When removing xenobiotic-related material, only a small time slice at one or two
specific masses is removed from the data set. Therefore, the remaining
LC/MS data is left unaltered and available for mathematical analysis, e.g., PCA.



Principal component analysis is a very effective mathematical device for
analyzing the data obtained from MS. However, PCA is a two-dimensional
technique, whereas data obtained from LC/MS is three-dimensional. Currently,
there exists a need to enable PCA to retain the chromatographic separation
information that is normally lost in conventional PCA analysis.



WO 03/102543 PCT/US03/17190 


In general, there are many experimental conditions where the information
about the results is contained in data that is characterized by multiple parameters.
When the interpretive tools cannot handle these parameters, information is lost. A
methodology that improves the resolution of analysis of LC/MS data that is
analyzed by the PCA process is applicable to other data sets.



SUMMARY OF THE INVENTION 

This invention pertains to a method of analyzing multi-dimensional data
using a tool that handles fewer dimensions. In a particular embodiment, this
invention pertains to a method of analyzing the three-dimensional data obtained
from a chromatography/spectrometry run, such as an LC/MS run, using a
two-dimensional multivariate statistical analysis technique such as PCA. Typically,
PCA analysis is performed using only two-dimensional data. In the present
invention, the analysis is performed using three-dimensional data.

Data obtained from a chromatography operation is characterized by a
number of parameters. In particular, the chromatography separates the sample as a
function of time. Retention time, the length of time a component of the sample is
retained on the solid phase, is particularly informative. Different components of the
sample elute from the solid phase at different times depending on the conditions.
This time-dependent characteristic is useful in differentiating particular components
of the sample.

Further parameters may be specific to the detection method used after the
chromatography separation. For instance, in fluorescence detection the parameters
of interest are wavelength and intensity of light, while in ultraviolet detection, a
parameter of interest is the amount of absorbed light. In many cases, the greater the
number of parameters an analysis method can handle, the more discriminating the
result. When one parameter must be ignored entirely, the opportunity to
discriminate between data points that differ only by that parameter is lost.

Data obtained from an LC/MS operation consists of three parameters. The
first of these parameters is specific to the LC portion of the system. Liquid
chromatography separates analytes as a function of time. Therefore, the data
obtained from the LC portion is time-based, i.e., retention time (Rt). Different
analytes will elute from the solid phase element of the LC system (the column) at
different times depending upon the eluant, etc. This differential separation yields
a characteristic Rt for each analyte. This information can be useful in
determining, for example, the identity of a particular analyte.



The second and third parameters are specific to the MS portion of the LC/MS
system. Once a sample is separated via LC, individual analytes enter the MS
portion of the system. Depending upon the conditions of the MS, the analyte will
be ionized within a given volume and then be accelerated to a molecular mass
detector. The mass-to-charge ratio (m/z) and intensity can be determined following
mass detection. This information is used to determine differences between two or
more samples.

Typically, PCA analysis is used to perform this determination using only
the m/z and intensity parameters. Thus, the Rt data does not contribute to the
overall analysis. This tradeoff has been accepted because of the abundance of data
available from the mass spectrometer, with approximately 5 to 10 AMU points per
ion peak. The present invention addresses this limitation by incorporating data binning.
With data binning, the Rt information is not lost in the PCA analysis, thus
facilitating three-dimensional data analysis. Comparison between samples now
includes Rt, m/z, and intensity data. This technique is especially useful in
surveying the results of a complex sample, where the answer to a simple question
must be found in masses of complex data. Hence, subtle changes between samples
heretofore undetectable can now be elucidated.



In one embodiment, individual LC/MS chromatograms are divided into
predetermined fractions (or bins). These bins reflect the Rt parameter. The size of
the bin is determined by the operator. For example, the bin size can be one minute,
reflecting one-minute intervals along the chromatogram. Within each bin, the
intensity information for all the scans performed during the bin time is summed
for each species (m/z) present during the corresponding Rt. The bin, or more
precisely the data contained within it, is then transformed and exported to a data
matrix such as a commercial spreadsheet, for example, Excel. Therefore, the
original graphical data is transformed into a tabular form of data. The information
within each bin that is transformed is chromatographic time, m/z, and intensity.
More than one bin can undergo this transformation. For example, there can be sixty
bins representing one bin per minute for a sixty-minute LC procedure. Alternatively,
bins may be organized in an overlapping fashion so that, for instance, ninety
one-minute bins originating once per 40 seconds are used to analyze a sixty-minute LC
procedure. Once the transformation is complete, the data sets (data within the table)
are aligned with zeros. Alignment of the data sets simply refers to a gap-filling
procedure. For example, if in bin 3 at m/z=100 there is an intensity value of 102
and in bin 4 at m/z=100 there is no intensity value, then a zero is placed in bin 4 at
the intensity position corresponding to m/z=100. In other words, the zero serves as
a placeholder, thus allowing for proper data alignment. Hence, each data set has
one intensity value (including zero) for each m/z. Additionally, isotope abundances
may be combined, for example, combining the values for C12 and its isotope C13.
Finally, the spreadsheet files are transferred to a commercial PCA system, for
example, MatLab, for PCA analysis.
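
The summation step of this embodiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the applicants' implementation: the flat (retention time, m/z, intensity) tuple layout, the function name bin_scans, and the sample values are all hypothetical.

```python
from collections import defaultdict


def bin_scans(scans, bin_width_min=1.0):
    """Sum intensity per (bin index, m/z) over all scans falling in each bin.

    `scans` is an iterable of (retention_time_min, mz, intensity) tuples;
    this layout is an assumption made for illustration only.
    """
    binned = defaultdict(float)
    for rt, mz, intensity in scans:
        bin_index = int(rt // bin_width_min)  # time slice the scan falls in
        binned[(bin_index, mz)] += intensity  # accumulate per species in the bin
    return dict(binned)


# Hypothetical scan data: two scans of m/z 100 fall in the first minute.
scans = [
    (0.2, 100, 40.0),
    (0.7, 100, 62.0),
    (0.9, 283, 10.0),
    (1.4, 283, 5.0),
]
table = bin_scans(scans)
```

Here the two m/z 100 readings in the first minute are summed into a single entry for bin 0, and bin 1 has no entry at m/z 100, which is the gap that the zero alignment step would later fill.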


BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic showing the steps involved in an embodiment of the 
present invention; 

FIG. 2 is a comparison of negative ion BPI LC/MS chromatograms for 
control and dosed rats; 

FIG. 3 is a comparison of negative ion BPI LC/MS urine chromatograms 
from rats dosed with compound A at time point 1 and time point 2; 

FIG. 4 is a comparison of negative ion BPI LC/MS urine chromatograms 
from rats dosed with compound B at time point 1 and time point 2; 

FIG. 5 is a comparison of negative ion BPI LC/MS urine chromatograms 
from rats dosed with compound C at time point 1 and time point 2; 

FIG. 6 (a) is an extracted ion chromatogram, and (b) is an MS spectrum for
the peak at 338 m/z;

FIG. 7 is a PCA plot of an entire data set consisting of rat urine samples 1-24;

FIG. 8 (a) is a set of combined and overlaid mass spectra, and (b) is a PCA plot of
the fraction from 5 minutes to 6 minutes; and

FIG. 9 (a) is a set of combined and overlaid mass spectra, and (b) is a PCA plot of
the fraction from 3 minutes to 4 minutes.



DETAILED DESCRIPTION 

This invention pertains to a method of analyzing multi-dimensional data
using a tool that handles fewer dimensions. This invention has particular
applicability to analysis of the data from a chromatography procedure followed by a
spectrometric procedure. The data set in these cases has a time component, as well
as typically an intensity value for a physical or chemical characteristic of the
sample. In a particular embodiment, this invention pertains to a method of
analyzing data obtained from a Liquid Chromatography (LC)/Mass Spectrometry
(MS) procedure using Principal Component Analysis (PCA). Typically, PCA
analysis is performed using only two-dimensional data. In the present invention,
the analysis is performed using three-dimensional data. The detailed description of
the LC/MS and PCA analysis illustrates the methodology of the general case as
well as this specific embodiment.



The chromatography portion of the system permits separation of analytes
within a sample matrix. The information obtained from such a procedure is
typically reported as retention time (Rt). As the analytes progress through the
system, they enter the MS region of the LC/MS system. There they are ionized, and
a mass detector then detects these ionized species, providing m/z and intensity
information. Therefore, an LC/MS system can furnish at least three pieces of
information. While many multivariate statistical analysis techniques can be used,
principal component analysis (PCA) is a robust method of analyzing data and is
quite useful when making comparisons between different samples. However, PCA
analysis is typically performed using only a two-dimensional data set. Therefore, it
is expected that at least one parameter of data obtained from an LC/MS operation is
lost. In the present invention, the analysis is performed using three-dimensional
data sets along with PCA analysis.



A commonly employed method to separate analytes in a sample matrix is
chromatography, in particular, liquid chromatography (LC), although gas
chromatography (GC) provides similar data for samples that are gaseous. High
Performance Liquid Chromatography (hereinafter "HPLC") allows for fast and
efficient separation and characterization of analytes within a given sample.
Components of an HPLC system include high pressure pumps which facilitate the
movement of an aqueous and/or organic phase through the system. This aqueous
phase (or mobile phase) comprises a solvent that is used to initially equilibrate the
HPLC system. The solvent also provides an aqueous milieu for analytes to traverse
the entire HPLC system. Finally, the mobile phase comprises solvent
which elutes analytes from an HPLC column.

Another component of an HPLC system is the chromatography separations 
column. The column comprises a solid phase. This solid phase, in combination 
with the mobile phase, effectuates differential separation of analytes contained 
within a sample matrix. The solid phase generally consists of chemical polymers
that interact with a certain class of analytes. For example, a reverse-phase HPLC 
column has a solid phase chemistry (e.g., a hydrocarbon chain consisting of 
eighteen carbons attached to a silica bead) that interacts with analytes via 
hydrophobic forces. 


Once the analytes are eluted from the column, they traverse into and
through a detector. There are a variety of detection systems that can be employed
in an HPLC system. For example, there are ultra-violet ("UV") detectors that
detect analytes within the UV range, infrared (IR) detectors that detect analytes
within the IR range, electro-chemical detectors that detect analytes using
voltammetric techniques, nuclear magnetic resonance detectors that detect analytes
based on their response to a magnetic field, and mass spectrometer detectors that
detect analytes based on the mass to charge ratio.
In an LC/MS system, the sample matrix is first subjected to LC separation,
thereby effectuating the separation of analytes within the sample. This information,
often conveyed by an analyte's Rt, can be used for characterizing and identifying
the analyte. However, further information can be obtained, especially for
comparisons made between different samples, by subjecting the analytes to MS
detection. In general, mass spectrometers ionize molecules within a given volume
and then accelerate the ionized molecules to a molecular mass detector. Ionization
of a molecule can occur through electron ionization, chemical ionization,
electrospray ionization, or photoionization. The ionization process can occur under
various pressure conditions, including atmospheric pressure. Information obtained
from MS includes the intensity of response for a particular m/z value. It is this
intensity (as well as m/z) information that is important when comparing two or
more samples.

The MS information can be harnessed for comparative analysis. For
example, assume that one is experimenting with a human cell line. The goal of the
study is to determine what effect, if any, compound X has on a particular metabolite.
To continue, assume that the operator has obtained two samples: sample 1, a control
sample, i.e., a cell extract from a cell culture to which no compound X was added,
and sample 2, a cell extract taken from a cell culture system to which compound X
was added. These two samples can be subjected to LC/MS. Let us further assume
that the metabolite of interest is amenable to chromatographic isolation using a
reverse-phase column. Once separated via LC, the samples, sample 1 and sample 2
independently, can be subjected to MS analysis. The MS signals generated from
the two samples can then be compared, particularly for the metabolite of interest, to
determine if its intensity increases, decreases, or remains unaffected by the addition
of compound X.

Figure 1 shows a flow chart depicting the various steps of the present
invention as applied to an LC/MS experiment. FIG. 1a depicts in a graphical
format data obtained from an LC/MS procedure on a fictitious sample. As shown,
the graph defines three parameters. The X-axis defines the m/z parameter; for
example, the X-axis can extend from 100 to 800 m/z in 0.1 m/z steps. The Y-axis
of FIG. 1a represents the intensity or response parameter. The intensity can be set
as a percentage of a certain ion, e.g., the base peak. Finally, there is a Z-axis that
represents the time parameter. Depicted in FIG. 1a are ion species differing in at
least one if not all three parameters. Yet each ion species is uniquely defined by the
parameters in FIG. 1a.



Extant PCA analysis requires that the data represented in FIG. 1a be
converted into a two-dimensional data set prior to analysis. For example,
conventional PCA analysis can be performed using the m/z and intensity
parameters. However, this would eliminate the time parameter. A more thorough
analysis would include all relevant information available, including the time
dimension. FIG. 1b represents a process whereby all three parameters can be
conserved and subjected to analysis. Data binning is a process of separating one
data set representing multiple measurement points into multiple data sets that
aggregate measurement points. A bin corresponds to a range of one parameter, the
bin parameter. The variation of the bin parameter within each bin is eliminated, such
as by accumulating all data points within the bin at the bin value. Note that a bin
that encompasses the entire data set has the same effect as ignoring the bin
parameter. While it is most common to utilize disjoint bins, overlapping bins may
be used for some analyses. In transitioning from FIG. 1a to FIG. 1b, a process of
data binning occurs wherein a predetermined bin size is selected. The bin size is
defined along the time dimension. For example, the bin size can be two minutes.
Multiple bins are shown in FIG. 1b. Each bin contains both m/z and intensity
information for ion species found in all the scans falling within that bin. The bin
(or region) sizes are selected by a practitioner based on experience and the sample
being tested. Selection methods are well known to those skilled in the art. For
example, MassLynx software can be employed. Using this software, a bin is
collected as follows: (1) open the total ion chromatogram; (2) select a bin region by
pressing the right mouse button and dragging over the bin area, whereupon a
combined mass spectrum (the bin) appears; (3) in the combined mass spectrum (bin)
window, a "List Spectrum" option appears and is activated; (4) once List Spectrum
is chosen, a window displays each m/z with its corresponding abundance in a
two-column format; (5) copy the data contained within the two columns; and (6) paste
it into a spreadsheet such as Excel.
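
To make the distinction between disjoint and overlapping bins concrete, the following Python sketch generates bin boundaries along the time dimension. It is an illustration only: the function names bin_edges and bins_containing, the use of whole seconds, and the half-open interval convention are assumptions, not part of the disclosed method or of any MassLynx interface.

```python
def bin_edges(run_length_s, bin_width_s, step_s=None):
    """Return half-open (start, end) bins in seconds covering a run.

    step_s == bin_width_s (the default) gives disjoint bins;
    step_s < bin_width_s gives overlapping bins.
    """
    if step_s is None:
        step_s = bin_width_s
    return [(start, start + bin_width_s)
            for start in range(0, run_length_s, step_s)]


def bins_containing(rt_s, edges):
    """Indices of every bin whose [start, end) interval contains rt_s."""
    return [i for i, (start, end) in enumerate(edges) if start <= rt_s < end]


disjoint = bin_edges(3600, 60)         # one-minute bins over a 60-minute run
overlapping = bin_edges(3600, 60, 40)  # one-minute bins starting every 40 s
```

With disjoint bins a scan at 50 seconds belongs to one bin only, whereas with the overlapping layout it contributes to two. Under these assumptions the last overlapping bin starts at 3560 seconds and nominally extends past the end of the run; how that tail should be treated is not specified in the text.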



Following the data binning process, a transformation step occurs (see FIG.
1b to FIG. 1c). This transformation converts the separate lists derived from the
graphical data format into a matrix data representation. A practitioner can begin
this transformation by simply employing a "cut and paste" procedure, delivering the
two-column data set information to a commercial spreadsheet such as Excel. Once this
part of the transformation is complete, the data set needs to be aligned with zeros.
Each bin must have a value for all the ions to be analyzed. In order to assure analysis
fidelity, zeros are inserted as intensity values where no ion signal was observed for
that bin, resulting in each data set having one intensity value for each m/z
parameter. Then the m/z parameter columns can be merged into one column that
acts as one axis of the matrix. Alternatively, a software program can be used to
automate this transformation. A depiction of the tabular representation is shown in
FIG. 1c. In FIG. 1c there is an m/z column and a bin row. The m/z column contains
the m/z ions to be analyzed. The bin row contains the desired bins to be analyzed.
Each matrix intersection contains the intensity value corresponding to the respective
m/z and bin parameters. Further data manipulation, based on known transformations
in m/z and intensity caused by the test, may be done by the operator before the data is
subjected to analysis. For instance, isotope abundances can be combined so that the
values for, e.g., C12 and its isotope C13 are combined.
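
The zero alignment and isotope combination steps could be automated along the following lines. This Python sketch is a hypothetical illustration: the function names align_bins and combine_isotopes, the dictionary representation of each bin's two-column list, and the example intensities (other than the value of 102 at m/z=100 used earlier in the text) are assumptions.

```python
def align_bins(bin_lists):
    """Merge per-bin {mz: intensity} lists into one zero-filled matrix.

    Returns (mz_axis, matrix), where matrix[i][j] is the intensity of
    mz_axis[i] in bin j, or 0.0 where no ion signal was observed.
    """
    mz_axis = sorted({mz for b in bin_lists for mz in b})
    matrix = [[b.get(mz, 0.0) for b in bin_lists] for mz in mz_axis]
    return mz_axis, matrix


def combine_isotopes(peaks, pairs):
    """Fold each isotope peak into its parent peak.

    `pairs` is a list of (parent_mz, isotope_mz) tuples supplied by the
    operator; nothing here detects isotopes automatically.
    """
    merged = dict(peaks)
    for parent, isotope in pairs:
        if isotope in merged:
            merged[parent] = merged.get(parent, 0.0) + merged.pop(isotope)
    return merged


bin3 = {100: 102.0, 283: 7.0}
bin4 = {283: 9.0}                      # no signal at m/z 100 in this bin
mz_axis, matrix = align_bins([bin3, bin4])
folded = combine_isotopes({283: 10.0, 284: 1.5}, [(283, 284)])
```

The row for m/z 100 receives a zero in the second column, so every bin ends up with exactly one intensity value per m/z, as the alignment step requires.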



The matrix or spreadsheet is then transferred and subjected to a commercial
PCA program where comparisons between various samples can be performed. One
such commercial PCA program is MatLab, available from MathWorks, Natick, MA.
Another PCA package, called Pirouette, can be obtained from InfoMetrix, located in
Woodinville, WA.


By blurring some of the mass spectrometer's inherent high resolution by
combining the data resulting from multiple scans via binning, the two-dimensional
analysis can handle the resulting three-dimensional data. This allows the mass of
data to be analyzed where a change between specimens is expected but the location
of the change in the various scans is not known. After the location of the variation
is known, later analysis can use the high resolution inherent in the raw data to
further characterize the differences. The binning technique can be applied to adapt
data sets to many types of multivariate statistical analysis.
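
For readers unfamiliar with how a two-dimensional technique consumes the binned data, the following Python sketch runs a basic PCA on a small sample-by-feature matrix using NumPy's singular value decomposition. It is a generic textbook formulation under stated assumptions, not the commercial or in-house programs referred to in this document; the function name pca_scores and the numeric values are hypothetical. Each row is one sample and each column is one (bin, m/z) intensity.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Basic PCA via SVD of the mean-centered data matrix.

    Rows of X are samples; columns are (bin, m/z) intensity features.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]


# Hypothetical data: rows 0-1 are controls, rows 2-3 are dosed samples.
# The second feature is strongly elevated in the dosed samples.
X = [
    [10.0, 5.0, 0.0],
    [11.0, 5.0, 0.0],
    [10.0, 50.0, 0.0],
    [11.0, 52.0, 0.0],
]
scores, loadings, explained = pca_scores(X)
```

In this toy data set the first principal component separates the two groups, and its largest loading falls on the elevated feature, which is how a loading can point back to the bin and m/z responsible for a separation.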

The features and other details of the invention will now be more particularly
described and pointed out in the following example. It will be understood that the
particular embodiments of the invention are shown by way of illustration and not as
limitations of the invention. The principal features of this invention can be
employed in various embodiments without departing from the spirit and scope of
the invention.



EXAMPLE 


Rat urine samples: 

A total of 20 rat samples were used for this experiment. The rats were
divided into four (4) groups. Group I received the dosing matrix alone; Groups II,
III, and IV were orally dosed with compounds A, B, and C, respectively. Samples
of rat urine were collected at two time points, 0-8 and 8-24 hours after dosing. The
rat urine was stored frozen at -20°C prior to analysis.



Chromatography: 

The chromatography was performed using a Waters Alliance 2795 HT
equipped with a column oven and a Waters 2996 PDA detector. A 20 µL/min
injection of rat urine (diluted 1:4 with distilled water) was made onto a 10 cm x 2.1
mm Waters Symmetry C18 3.5 µm column. The column was maintained at 40°C
and eluted under gradient conditions at a flow rate of 600 µL/min, where eluant
"A" was 0.1% aqueous formic acid, and eluant "B" was acetonitrile. The column
was maintained at 100% A for the first minute, then the mobile phase was rapidly
increased to 30% B over the next 7 minutes. The mobile phase was then rapidly
increased to 95% B over 0.1 minutes. This mobile phase was maintained for 0.5
minutes in order to wash the column and then returned to the initial conditions over
0.1 minutes. The next injection was made 10 minutes after the first injection.



Mass spectrometry:

The mass spectrometric analysis was performed on a MicroMass Quattro
Micro equipped with an electrospray interface ("ESI"). The instrument was
operated in negative ion mode with a capillary voltage of 3 kV; the cone voltage
was set to 25 V, and the nebulizer gas was set at 600 L/hr with a desolvation
temperature of 150°C and a source temperature of 70°C. The column effluent was
split in a ratio of 5:1, such that 100 µL/min entered the mass spectrometer source.
The instrument was operated in full scan mode, scanning from 100-800 m/z with a
scan time of 200 msec and an interscan delay time of 50 msec. Data was collected
from 0-10 minutes. The initial portion of the chromatogram was not directed to waste.



Data analysis:

The LC/MS chromatograms were inspected for the presence of any drug-related
products such as Phase I and Phase II metabolites. Each individual LC/MS
chromatogram was then divided into 10 fractions of 100 scans. The MS signal was
combined for each individual chromatogram fraction of 100 scans. The peak list
and ion abundances were then exported to an Excel file. The data sets were then
aligned, with zeros being inserted as intensity values where no ion signal was
observed, such that each data set had one intensity value for each m/z integer
between 100 and 800. The Excel files were then transferred to MatLab for PCA
analysis using an in-house program.
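
The alignment to one intensity value per integer m/z between 100 and 800 can be pictured with the following Python sketch. It is not the in-house program mentioned above, whose details are not disclosed; the function names, the rounding of measured m/z to the nearest integer, and the concatenation of the ten fractions into one row per sample are all assumptions made for illustration.

```python
MZ_MIN, MZ_MAX = 100, 800
N_MZ = MZ_MAX - MZ_MIN + 1  # 701 integer m/z positions


def fraction_vector(peaklist):
    """Zero-filled intensity vector for one chromatogram fraction.

    `peaklist` is an iterable of (mz, abundance) pairs. Index i of the
    result corresponds to integer m/z 100 + i. Rounding with round()
    is an assumption (Python rounds exact halves to the even integer).
    """
    vec = [0.0] * N_MZ
    for mz, abundance in peaklist:
        nominal = round(mz)
        if MZ_MIN <= nominal <= MZ_MAX:
            vec[nominal - MZ_MIN] += abundance
    return vec


def sample_vector(fractions):
    """Concatenate the per-fraction vectors into one row for one sample."""
    row = []
    for peaklist in fractions:
        row.extend(fraction_vector(peaklist))
    return row


# Hypothetical sample: peaks in the first two of ten fractions only.
fractions = [[(283.1, 10.0), (283.2, 5.0)], [(461.0, 3.0)]] + [[]] * 8
row = sample_vector(fractions)
```

Laid out this way, each sample becomes one row of 10 x 701 values, so the fraction (time) information survives as the position of a value within the row.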

Discussion:

Rat urine potentially contains several thousand components, most of which
remain unknown. The vast majority of these components are of moderate or high
polarity; hence, it was only necessary to employ a gradient from 0-30% organic
content to effect complete elution of all of the components in the urine samples.
Negative ion ESI MS was chosen as the mode of MS detection as it gave a more
information-rich data set than positive ion ESI. The chromatography column
length, gradient time, and effluent flow rate employed in this study were chosen to
give the best balance between sample throughput and chromatographic resolution.
They provided a sample throughput of 6 samples per hour and a peak capacity of 78.
While this peak capacity is not very high, the use of mass spectrometry scanning
from 100-800 m/z provides a much greater effective peak capacity. In this study,
we have employed a simple reverse-phase chromatography system. It is fully
appreciated that many highly polar compounds, such as amino acids and sugars,
will not be retained. Modifying the system permits elucidation of these
compounds.

A preliminary comparison of the base peak intensity ("BPI")
chromatograms of the control rat urine and urine samples from the dosed animals at
time point 1 reveals a qualitative difference, see FIG. 2. As can be observed from
these chromatograms, there are several peaks in the dosed sample group
chromatograms that show elevated levels when compared to those in the control
group. These peaks elute in the 5-9 minute region of the chromatogram. A
thorough investigation of these peaks confirmed that none of them were metabolites
resulting from the dosed compounds. Therefore, these peaks must be due to a
change in the metabolic state of the animals.

The BPI chromatograms obtained from the time point 2 samples for
compound "A" show some subtle changes in some peak intensities when
compared to the time point 1 samples, see FIG. 3. In the time point 2 sample, the
analytes eluting between 5-7 minutes seem to have reduced relative concentration
compared to those same analytes in time point 1. The samples for compounds "B"
and "C" show less variation between time points 1 and 2, see FIGS. 4 and 5. As
these time point 1 and 2 samples are day and night samples, one would expect to see
a significant variation between the two sample types even without the dosing of a
xenobiotic. This implies that the compounds associated with the changes in animal
metabolism between day and night were either not retained or not detected.

The peaks identified to have changed in the dosed samples compared to the
control are listed in Table 1. (An extracted ion chromatogram of one of these ions
at m/z=338 and the MS spectrum of this peak are given in FIG. 6.)
TABLE 1

Compound dosed    Analyte m/z value    Relative change
A                 283                  10 fold increase
A                 461                  5 fold increase
A                 187                  10 fold increase
B                 338                  2 fold decrease
B                 283                  10 fold increase
B                 461                  10 fold increase
B                 187                  10 fold increase
C                 283                  20 fold increase
C                 187                  30 fold increase



The data in Table 1 illustrate the changes in peak intensity observed for
these peaks in the control samples and the dosed samples. As can be observed from
this data, the changes in the peak concentrations are very significant. The fact that
these compounds were also observed in the control samples confirms that these
peaks were not metabolic products of the dosed compounds.

The PCA analysis of the entire data set is presented in FIG. 7. In this figure,
the data relating to the control samples are contained within the ringed areas. The
control samples are clearly separated from the dosed animal samples. This PCA
analysis was performed using the entire data set of retention time and m/z values.
Fractionating the chromatogram into 10 time regions also allowed PCA analysis to
be performed on specific time regions of the chromatogram.
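
Restricting the analysis to one time region amounts to selecting the columns that belong to a single fraction before running PCA. The Python sketch below illustrates this under assumptions that are not stated in the source: each sample is stored as one row in which the ten fractions are laid end to end, each fraction holds 701 integer m/z values, and fractions are indexed from zero. The function name region_columns is hypothetical.

```python
def region_columns(sample_rows, fraction_index, n_mz=701):
    """Keep only the columns belonging to one chromatogram fraction.

    Assumes each row stores its fractions end to end, n_mz values per
    fraction, so fraction k occupies columns [k * n_mz, (k + 1) * n_mz).
    """
    start = fraction_index * n_mz
    return [row[start:start + n_mz] for row in sample_rows]


# Hypothetical rows: two samples, ten fractions of 701 values each.
rows = [[float(i) for i in range(7010)],
        [float(2 * i) for i in range(7010)]]

# Fraction index 5 would be the 5 to 6 minute region if each of the ten
# fractions spans one minute of the 10 minute run.
region = region_columns(rows, 5)
```

The reduced matrix can then be passed to whatever PCA routine is used for the full data set.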

The PCA analysis of the data between 5 and 6 minutes is presented in FIG.
8. Here it is possible to separate the control samples, contained inside the ringed
area, from the dosed samples. The combined MS spectrum indicates that the MS
peaks responsible for the mapping of the dosed and control samples to different
regions of the PCA plot were m/z=283 and m/z=261. The m/z=283 ion has
increased in intensity in the dosed samples whereas the m/z=261 ion has reduced in
intensity in the dosed samples. This result confirms the observations made by
analyzing the LC/MS data.
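
The identification of the ions responsible for a separation can be illustrated with a minimal PCA sketch computed by singular value decomposition. The intensities below are hypothetical values chosen to mimic the behaviour described for the m/z=283 and m/z=261 ions; mean-centering without scaling is an assumption of the sketch, as is the helper name pca.

```python
import numpy as np


def pca(X, n_components=2):
    """Mean-center X (samples x variables) and return (scores, loadings)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]
    return scores, loadings


# Columns are intensities at the listed m/z values within one time region.
mz_axis = [261, 283, 338]
X = np.array([
    [900.0, 100.0, 50.0],    # control (hypothetical)
    [950.0, 120.0, 55.0],    # control (hypothetical)
    [300.0, 1000.0, 52.0],   # dosed (hypothetical)
    [280.0, 1100.0, 48.0],   # dosed (hypothetical)
])

scores, loadings = pca(X)

# Rank the m/z values by the magnitude of their loading on the first component.
order = np.argsort(-np.abs(loadings[0]))
top_mz = [mz_axis[i] for i in order[:2]]
```

In this sketch the two groups fall on opposite sides of the first component, and the two largest loadings belong to the ions whose intensities move in opposite directions. The sign of a principal component is arbitrary, so only the relative signs are meaningful.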

Figure 9 shows the PCA analysis of the LC/MS data between 3 and 4
minutes. In this data, there is no separation between the control and dosed samples.
This data indicates that in this region of the chromatogram there was no significant
difference between the dosed and control samples.

LC/MS coupled with PCA data analysis has been successfully applied to the
screening of rat urine following the administration of three candidate
pharmaceuticals. With this methodology, it was possible to differentiate the
control samples from the dosed samples. It was also possible, using PCA analysis,
to identify the components of the MS spectrum responsible for the separation.
This data clearly demonstrates that LC/MS is a viable alternative, or even a
complement, to proton NMR for metabonomic applications in drug discovery and
development.



While this invention has been particularly shown and described with
reference to embodiments thereof, it will be appreciated by those skilled in the art
that various changes in form and detail may be made therein without departing
from the spirit and scope of the invention as defined by the appended claims hereto.

What is claimed is:



1. A method of analyzing three-dimensional data derived from a sample using
a two-dimensional analysis technique, comprising:

(a) obtaining data for said sample, said data characterized by at least
three parameters, at least one of said parameters correlated with time;

(b) binning said data from step (a) forming a plurality of binned data
sets, wherein a bin size is selected based on a time parameter and wherein
said data points in said bin data sets are characterized utilizing two of the
remaining parameters;

(c) transforming each said bin data set into an aligned bin data set,
wherein alignment data points with a null second parameter are added to
said bin data sets so that all bin data sets have the same number of data
points and for each first parameter in said aligned data set at least one bin
data set has a non-null second parameter;

(d) forming a two-dimensional array by arranging said aligned bin data
sets with said first parameter on one axis and a bin number on a second axis
and said second parameter of the data points recorded at the intersections;
and

(e) subjecting said data from (d) to a two dimensional multi-variant
statistical analysis.

2. The method of claim 1 wherein said bins are disjoint. 

3. The method of claim 1 wherein said bins are partially overlapping. 

4. The method of claim 1 wherein said bin size ranges from a bin that
encompasses an entire range of said time parameter to a bin that encompasses no
more than two smallest identifiable increments of said time parameter.

5. The method of claim 1 wherein said two dimensional multi-variant
statistical analysis is chosen from the set consisting of classical least squares
analysis (CSL), principal component analysis (PCA), partial least squares analysis
(PLS) and partial least squares discriminant analysis (PLS-DA).

6. The method of claim 5 wherein said two-dimensional multi-variant
statistical analysis is principal component analysis (PCA).

7. The method of claim 1 wherein the transformation of step (c) further includes
merging data points when the values of said first parameter indicate said data points
are related, said merging accomplished by adding the respective values of said
second parameters.

8. The method of claim 1 wherein said three-dimensional data is derived from 
a chromatographic process combined with a spectrometric process. 

9. The method of claim 8 wherein said chromatographic process is a liquid 
chromatography process. 

10. The method of claim 8 wherein said spectrometric process is chosen from
the set consisting of ultraviolet spectroscopy, infrared spectroscopy, mass
spectrometry and nuclear magnetic resonance.

11. The method of claim 10 wherein said spectrometric process is mass
spectrometry.

12. The method of claim 8 wherein the non-time based parameters are a
physically accessible variable corresponding to a physical/chemical property of
said sample and an intensity of said variable for each species of variable contained
within each bin and said time parameter is a duration of R_t.

13. The method of claim 12 wherein said physically accessible variable 
corresponding to a physical/chemical property of said sample is m/z. 

14. The method of claim 13 wherein the transformation of step (c) further
includes merging data points that have an m/z parameter indicating said data points
represent isotopes of the same ion, said merging by adding respective intensity
values.

15. The method of claim 8 wherein said three-dimensional data is derived from 
a liquid-chromatography/mass spectrometry process. 

16. A method of analyzing three-dimensional data using principal component
analysis (PCA), comprising:

(a) obtaining chromatography/mass spectrometry data for a sample, said
data characterized by at least three parameters, wherein at least one of the
parameters is selected from the set consisting of R_t, m/z and intensity;

(b) binning said data from step (a) forming a plurality of binned data
sets, wherein a bin size is selected based on R_t and wherein said data points in said
bin data sets are characterized using one or more m/z and intensity values for each
ion species contained within each bin;

(c) transforming each said bin data set into an aligned bin data set,
wherein alignment data points with a null intensity parameter are added to said bin
data sets so that all bin data sets have the same number of data points and for each
m/z value in said aligned data sets at least one aligned bin data set has a non-null
intensity value associated with said m/z value;

(d) forming a two dimensional array by arranging said aligned bin data
sets with said m/z on one axis, a bin number on a second axis and an intensity value
recorded at the intersections; and

(e) subjecting said data from (d) to PCA analysis.

17. The method of claim 16 wherein said chromatography/mass spectrometry
data is derived from a liquid-chromatography/mass spectrometry process.

18. The method of claim 16 wherein the transformation of step (c) further
includes merging data points where m/z parameters indicate said data points
represent isotopes of the same ion, said merging by adding respective intensity
values.

19. The method of claim 16 wherein said two-dimensional array is formed in a
spreadsheet.

20. The method of claim 16 wherein said bins are disjoint. 

21. The method of claim 16 wherein said bins are partially overlapping.

22. The method of claim 16 wherein said bin size ranges from a bin that 
encompasses an entire range of said time parameter to a bin that encompasses no 
more than two smallest identifiable increments of said time parameter. 


